A new soft subspace clustering algorithm is proposed to address the optimization of the projected subspaces, a problem that most existing soft subspace clustering algorithms do not consider. Maximizing the deviation of feature weights is proposed as the subspace optimization goal, and a quantitative formula for it is presented. On this basis, a new objective function is designed that minimizes within-cluster dispersion (that is, makes each cluster as compact as possible) while optimizing the soft subspace associated with each cluster. A new expression for computing feature weights is derived mathematically, and the clustering algorithm is then defined within the framework of classical k-means. Experimental results show that the proposed method significantly reduces the probability of becoming trapped prematurely in a local optimum and improves the stability of the clustering results. It also achieves good clustering performance and efficiency, making it suitable for cluster analysis of high-dimensional data.
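The general shape of a k-means-style soft subspace algorithm can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the paper's weight formula or its deviation-maximizing objective, so the sketch substitutes an entropy-style exponential weight update as a stand-in. The function name `soft_subspace_kmeans` and the parameters `gamma`, `n_iter`, and `seed` are hypothetical.

```python
import math
import random


def soft_subspace_kmeans(X, k, gamma=1.0, n_iter=20, seed=0):
    """Feature-weighted k-means sketch: every cluster keeps its own weight vector.

    ASSUMPTION: the exponential weight update below is a stand-in, not the
    expression derived in the paper (which the abstract does not state).
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    centers = [list(x) for x in rng.sample(X, k)]  # random initial centers
    weights = [[1.0 / d] * d for _ in range(k)]    # uniform initial weights
    labels = [0] * n
    for _ in range(n_iter):
        # 1) Assign each point to the cluster with the smallest weighted distance.
        for i, x in enumerate(X):
            labels[i] = min(
                range(k),
                key=lambda c: sum(
                    weights[c][j] * (x[j] - centers[c][j]) ** 2 for j in range(d)
                ),
            )
        for c in range(k):
            members = [X[i] for i in range(n) if labels[i] == c]
            if not members:
                continue  # empty cluster: keep its previous center and weights
            # 2) Update the center as the per-feature mean of its members.
            centers[c] = [sum(m[j] for m in members) / len(members) for j in range(d)]
            # 3) Update the weights: a feature with small within-cluster
            #    dispersion receives a large weight in this cluster's subspace.
            disp = [sum((m[j] - centers[c][j]) ** 2 for m in members) for j in range(d)]
            lo = min(disp)  # shift before exponentiating, for numerical stability
            expo = [math.exp(-(v - lo) / gamma) for v in disp]
            total = sum(expo)
            weights[c] = [e / total for e in expo]
    return labels, centers, weights
```

Calling `soft_subspace_kmeans(X, k=2)` on a list of equal-length numeric rows returns the cluster labels, the centers, and one weight vector per cluster. Each weight vector sums to 1, and in this stand-in a smaller `gamma` concentrates the weight on fewer features.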
Traditional n-gram feature extraction tends to produce high-dimensional feature vectors. High-dimensional data increases both the difficulty of classification and the classification time. To address this problem, this paper presents a feature extraction method based on part-of-speech (POS) tag sequences. The method uses POS sequences as text features to reduce the feature dimension, based on the observation that POS sequences can characterize a class of text. In the experiments, compared with n-gram feature extraction, the POS-sequence features improved classification accuracy by at least 9% and reduced the feature dimension by 4816. The experimental results show that the method is suitable for emotion classification of microblog text.
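The dimensionality argument can be illustrated with a small sketch that builds n-gram vocabularies over words and over POS tags for the same texts. It assumes the posts have already been tagged by some POS tagger. The toy posts, their hand-assigned tags, the tag set, the choice of bigrams, and the helper names `ngrams`, `build_vocab`, and `vectorize` are all hypothetical, since the abstract does not specify the paper's exact feature definition.

```python
from collections import Counter


def ngrams(seq, n):
    """Return all contiguous n-grams of a sequence as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def build_vocab(docs, n):
    """Map every distinct n-gram in the corpus to a feature index."""
    vocab = sorted({g for doc in docs for g in ngrams(doc, n)})
    return {g: idx for idx, g in enumerate(vocab)}


def vectorize(doc, vocab, n):
    """Count-based feature vector of one document over a fixed vocabulary."""
    vec = [0] * len(vocab)
    for g, count in Counter(ngrams(doc, n)).items():
        if g in vocab:
            vec[vocab[g]] = count
    return vec


# Toy posts with hand-assigned, illustrative tags (not taken from the paper).
tagged_posts = [
    [("I", "PRON"), ("love", "VERB"), ("this", "DET"), ("phone", "NOUN")],
    [("she", "PRON"), ("hates", "VERB"), ("that", "DET"), ("movie", "NOUN")],
    [("what", "DET"), ("a", "DET"), ("great", "ADJ"), ("day", "NOUN")],
]
words = [[w.lower() for w, _ in post] for post in tagged_posts]
tags = [[t for _, t in post] for post in tagged_posts]

word_vocab = build_vocab(words, 2)  # word-bigram feature space
pos_vocab = build_vocab(tags, 2)    # POS-bigram feature space
pos_vectors = [vectorize(t, pos_vocab, 2) for t in tags]
```

Because the tag set is far smaller than the word vocabulary, different sentences share POS n-grams, so `pos_vocab` ends up smaller than `word_vocab` even on this toy input. The resulting `pos_vectors` could then be passed to any standard classifier.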